Image caption generation model with convolutional attention mechanism

HUANG Youwen, YOU Yadong, ZHAO Peng

Journal of Computer Applications 2020, 40 (1): 23-27. DOI: 10.11772/j.issn.1001-9081.2019050943

An image caption model must extract features from an image and then express those features as a sentence using Natural Language Processing (NLP) techniques. Existing image caption models based on a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) suffer from low precision and slow training when extracting key information from the image. To address these problems, an image caption generation model based on a convolutional attention mechanism and a Long Short-Term Memory (LSTM) network is proposed. Inception-ResNet-V2 serves as the feature extraction network, and fully convolutional operations replace the traditional fully connected operations in the attention mechanism, reducing the number of model parameters. The image features and text features are effectively fused and fed to the LSTM unit for training, so that the model generates semantic information that captions the image content. The model was trained on the MSCOCO dataset and validated with a variety of evaluation metrics (BLEU-1, BLEU-4, METEOR, CIDEr, etc.). The experimental results show that the proposed model captions image content accurately and outperforms the method based on a traditional attention mechanism on various evaluation metrics.
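
The abstract does not give the layer-level design, so the following is only a minimal NumPy sketch of one way attention scoring can be written as a 1x1 convolution over a feature map, with a rough parameter count against a fully connected layer on the flattened map. The function names, the weight shapes, the attention size `A`, and the 8 x 8 x 1536 feature-map size used in the count are all illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def conv_attention(features, hidden, w_feat, w_hid, w_score):
    """Soft attention scored by 1x1 convolutions (hypothetical sketch).

    features: (H, W, C) CNN feature map
    hidden:   (D,) previous LSTM hidden state
    w_feat:   (C, A), w_hid: (D, A), w_score: (A,)  -- assumed shapes
    """
    # A 1x1 convolution applies the same linear map at every spatial
    # location, so it reduces to a matmul over the channel axis.
    proj = np.tanh(features @ w_feat + hidden @ w_hid)   # (H, W, A)
    scores = proj @ w_score                              # (H, W)
    alpha = softmax(scores.reshape(-1)).reshape(scores.shape)
    # Context vector: attention-weighted sum of the location features.
    context = (alpha[..., None] * features).sum(axis=(0, 1))  # (C,)
    return context, alpha

def conv_param_count(C, D, A):
    # Weights are shared across all H*W locations.
    return C * A + D * A + A

def dense_param_count(H, W, C, D, A):
    # Assumed baseline: a fully connected layer over the flattened
    # H*W*C feature map, followed by a layer producing H*W scores.
    return (H * W * C) * A + D * A + A * (H * W)
```

Under these assumptions the convolutional form has no dependence on `H` or `W` in its parameter count, which is one plausible reading of the parameter reduction the abstract reports; the actual savings depend on the baseline the authors compared against.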
